0. Introduction and setup
1. Create and Delete Table
2. Select, From, Where, Top
3. Inner Join
4. Other Joins
5. Alias
6. And, Or, In, Not, Between, Like, Case
7. Union, Union All
8. Data Types
9. Dates
10. String Functions
11. Subtotal, Count, Group, Having
12. Window Functions
13. Subqueries, CTEs, Temp Tables, Table Variables
14. Delete, Truncate, Update, Alter, Select Into
15. Null Values
16. Primary Key and Foreign Key
17. Indexes
18. Execution Plans
1. Create and Delete Table
To create a table and insert data into the table, first need to create a database
in which to store the data. To create a database, use “create database databasename;”,
where “databasename” is the desired name of the database. For example, using the name
“ex” for the database:
[text]
When the command successfully executes, a message will display indicating that the
database has been created.
Once a database is created, need to use the command “use databasename;” to begin
working with the database:
[text]
A table is created using the command “create table tablename
(columnname1 columntype1, columnname2 columntype2, columnname3 columntype3);”,
where “tablename” is the desired name of the table, and the “columnnames” and “columntypes”
are the desired column name and data types of the columns. A table can have more than
three (and up to 1,024) columns.
[text]
The data type determines which type of data the column can hold. For example, ”int” designates that a column to be “integer”, which means the column can contain whole number values between -2,147,483,648 and 2,147,483,647. “varchar(30)” designates that the column can contain up to 30 characters (letters, numbers, symbols). For more information on data types, see section 8.
After a table is created, data can be loaded into the table using
“insert into tablename (columnname1, columnname2, columnname3) values
(row1column1value, row1column2value, row1column3value),
(row2column1value, row2column2value, row2column3value),
(row3column1value, row3column2value, row3column3value);
[text]
A message will display indicating the number of rows in the table affected.
(Three rows of data were inserted, so three rows were affected.)
Delete Table
To delete a table, use "drop table tablename":
[text]
Result:
If you try selecting data from the table, an error message will return,
as the table no longer exists:
2. Select, From, Where, Top
To view the data that is in a table, use the command “select * from tablename”:
[text]
Gives:
The asterisk (*) in the query tells SQL Server to return all columns of data
in the table. Alternatively, desired column names to include in the query results
can be listed after the select keyword, as in the command
“select columnname1, columnname2, columname3 from tablename”:
[text]
The query again gives:
Not all columns in the table need to be listed:
[text]
Gives:
Columns can be listed in any order:
[text]
Gives:
Where
The queries so far have returned all rows in the given table. To select only
certain rows, can use the “where” keyword, followed by some criteria, such as
“columnname = number”, to find only rows in the table where the “columnname”
column equaled the “number”.
[text]
Gives:
The criteria being searched for is called the search condition. A search condition is composed of one or more predicates. A predicate is an expression that evaluates to true, false, or unknown.
The search condition can also be “columnname = ‘word’”, where the “word”
is some word to be matched:
[text]
Gives:
Top
To select a limited number of search results meeting a given criteria, can
use “Top” keyword. The following example will return one of the two results in
the Customers table with LastName of Smith.
[text]
Gives:
“Top” is useful for getting sample results from a table without having to
return all data. The usage here is just an example, but “Top” would be more
useful in a table with millions of rows.
3. Inner Join
To combine data from multiple tables, will use “joins”. To see how this is used,
first create another table:
[text]
[text]
Selecting all data from the Orders table gives:
Suppose we want the name of a customer for a given order number, for example, OrderID 13.
We could:
[text]
We see that CustomerID is 2, so we can look this up in the Customers table and see that
the customer name is Jane Smith:
[text]
It is possible, however, to combine these queries together. Both tables can be put in the"from"
clause of the query, separated by a comma, and then a condition can be added to the where clause
specifying that the CustomerID in the Customers table must equal the CustomerID in the Orders table:
[text]
Combining two tables together to query from them is called joining the tables, and the condition specifying that a field from one table must equal a field from the other table is called a join condition.
There are multiple types of joins. The previous example shown is called an inner join, specifically.
The join condition can be specified in the where clause as shown in the previous example,
but it can also be specified in the from clause. As the other types of joins besides
inner join use the from clause only, many people prefer to put all inner joins in the
from clause as well. In this case, rather than separate the tables with a comma, the keywords
“inner join” are put between the tables, and then following the second table’s name is put
the keyword “on”, followed by the join condition:
[text]
As can be seen, this query gives the same result as the previous example:
Inner join is the default type of join for the join keyword, and can be omitted, so the
following query will again give the same result:
[text]
A possible question may be, why separate the data into two different tables, instead of
putting all data in a single table, as in the following:
[text]
The answer is that if all data were combined into a single table, then there would be a lot
of repetition of the customers’ names. When there is only fifteen rows of data as in the example,
that doesn’t make much of a difference. But if this table had millions or billions of records,
the space savings would add up. Additionally, if you were storing other information about the
customer, such as phone number, address, email address, etc., you would have more repetition
per order when a customer made multiple orders.
4. Other Joins
Suppose we have a slight alteration of the first example, where all orders belonging to
CustomerID 2 are changed to belong to CustomerID 4:
[text]
If we run the query again to try to find the customer name for OrderID 13, it returns no result,
as we did not add a CustomerID 4 to the customer table.
[text]
Suppose we wanted to still return the order information from the Order table, even if a corresponding
customer was not found in the Customer table. Can use a right join to return data found in the second
table of a join, even if no entry exists in the first table that satisfies the join condition:
[text]
Similarly, a left join will return data found in the first table of a join, even if no entry exists
in the second table that satisfies the join condition:
[text]
The functionality of a left join and a right join can be combined using a full outer join.
Supposing we wanted to return data about all customers and orders, we could use a full outer join
to include rows where the CustomerID matches between the Customers and Orders table, as well as
rows in either table that do not match the other:
[text]
Cross Join
Suppose we forgot to specify a join condition, and we executed the
following query:
[text]
As there is no condition to match rows in one table with the other table,
the result is a combination of each row in the Customers table with each row
in the Orders table. Because there are three rows and the Customers table and
15 in the Orders table, it results in 45 rows:
As an example of a more useful cross join, suppose we had a table of product
types and a table of colors.
[text]
If we did a cross join between the two tables,
we could see all combinations of products and colors:
[text]
As there were three products and four colors, there are twelve records in
the result:
5. Alias
To avoid having to write table names repeatedly throughout a query, it is possible to
assign abbreviations to the table names. These abbreviations are called aliases. In the
“from” clause of the query, an alias can be assigned by adding a space, then the keyword “as”,
then another space, and then the abbreviation, as in “tablename as alias”. For example:
[text]
As can be seen, this is equivalent to the results for:
[text]
The “as” keyword in the “from” clause can be omitted:
[text]
Aliases can also be used with column names. In this case, the abbreviation is used
to change how the column name is displayed in the query results.
[text]
As an example, suppose we want to add a space between the words in “FirstName” and “LastName”:
If we tried the following, an error is given:
[text]
To allow for spaces, we can enclose the multiple words within parentheses:
[text]
Gives:
Again, the “as” keyword can be omitted:
[text]
Instead of double quotes, single quotes can be used:
[text]
Brackets can also be used:
[text]
Column alias can also be added with equals sign
[text]
6. And, Or, In, Not, Between, Like, Case
And
To search for two criteria and require that the query results meet both criteria, use the "and" keyword.
For an example, add another customer to the Customers table:
[text]
The following query:
[text]
Gives:
More than two criteria can be specified and combined with the “and” keyword.
Or
To search for multiple criteria and require that the query results meet one of
the multiple criteria, use the “or” keyword.
[text]
Gives:
In
To search for records where a given column can equal any of multiple values
in a more concise way than using the “or” keyword, can use the “in” keyword.
To use the “in” keyword, use “columnname in (“value1, value2, value3”) where
columnname is the column to check for the values, and value1, value2, value3
are the values to be checked for:
[text]
Gives:
This is equivalent to the following:
[text]
which gives:
The example provided only had two values in the parentheses for the sake
of simplicity. For an example with more than two values:
[text]
Result:
Not
To return records that do not meet a criteria, use the “not” keyword
followed by the criteria in parentheses, as in:
[text]
Result:
As an alternative to the “not” keyword, can also put an exclamation point
in front of an equals sign to stand for “not equal”:
[text]
Result:
Between
To return records where a particular date column has a value two
particular dates, use the “between” keyword:
[text]
Result:
This is equivalent to using two separate criteria using “<=” and “>=”:
[text]
Result:
Like
Suppose we have an address field added to our Customer table.
[text]
Suppose we wanted to find all customers who live in a certain city. We cannot
use the following:
[text]
No results are returned, because SQL Server is looking for records where the
Address equals “Capital City”. To find rows where a field contains a certain
value, can use the Like keyword, along with what are called wildcard characters.
Like a wildcard in playing cards that can count as other cards, wildcard
characters can count as other characters when searching. A percent sign (%)
can count as zero, one, or more other characters. So, the following query will
return three results:
[text]
The percent sign is required before and after the words “Capital City”,
because there can be other characters before or after the city in the Address
field. Consider the following query
[text]
Gives:
In this example, only two results were returned, because although CustomerID 4, Jane Jones, has “Capital City, PA” in her address, there is no percent sign after “PA” in the query, and her Address includes her country of “USA” after her state.
Adding the percent sign after “PA” now returns CustomerID 4 as well:
[text]
Result:
An underscore (_) can be used to represent only a single character.
To search for a percent sign, it is necessary to escape the percent sign. This tells SQL Server to search for the percent sign instead of use it as wildcard. Characters are escaped by enclosing them in brackets, as in "[%]".
Case
SQL Server has the ability to check multiple criteria for a given row, and
then return different values depending on which criteria is met. This is done
using “case when criteria1 then value1 when criteria2 then value2 when
criteria3 then value3 else value4 end as resultcolumnname”, where the
criteria1, criteria2, and critera3 are the criteria to check; value1, value2
and value3 are the corresponding values to be returned; value4 is the value
to return if none of the criteria are met; and returncolumnname is the name
of the column in the query results.
[text]
Result:
More than three criteria can be checked.
The “else” part of the case statement is optional:
[text]
Result:
7. Union, Union All
Suppose it is desired to combine data from two different tables into a single result. To accomplish this, write separate query statements for each table that will return results with the same columns, and then combine the two statements by putting the “Union” or “Union All” keywords between them. Union returns results with duplicates removed, while Union All returns all results, including duplicates.
As an example, create another table to be combined with the Customers table:
[text]
Then:
[text]
Using “Union all”, it can be seen that the row with Jane Smith appears twice:
[text]
8. Data Types
Data types refer to the format the SQL Server saves the data of a column of a table in. Data can be numbers, text, or dates, for example, and there are different data types that correspond to these. Numbers, specifically, correspond to several different data types depending on how big the number is. For example, “tinyint” (int being short for integer) is used for numbers from 0 to 255, “smallint” is used for numbers from -32,768 to 32,767, and “int” is used for numbers from -2,147,483,648 to 2,147,483,647. Different columns of a given table can have different data types.
Data types can be of a fixed length or a variable length. Variable length data types have the advantage that SQL Server will only store the given data of a row of a column and not extra spaces, but have the cost of needing two extra bytes to store information about the length of that piece of data in a row of the column.
The goal of selecting a data type of fixed length for a column is to choose the smallest data type that will accommodate all of the given data that will be stored in a certain column. The smallest data type is preferable as it will use up less of the computer’s storage space, and allow for faster queries, as SQL Server has to search through less storage space. The data type of a column can be changed later to certain other data types, but this change will take time for SQL to make, especially as a table grows larger.
Looking at the three number data type examples shown prior, tinyint takes up one byte, smallint two bytes, and int four bytes. The range of possible values increases from tinyint to int, and the size in bytes of the data type increases as well, so we see the tradeoff- as larger data type can accommodate more values, but it has the cost of increased storage, hence the goal of choosing the small data type that will accommodate all of the given data that will be stored in a given column.
For alphanumberic data, there are the data types char, varchar, nchar, and nvarchar. Char and nchar are fixed length and varchar and nvarchar are variable length. A number n can be put in parentheses following the data type name which will define the number of bytes used, for example “char(8)” will use 8 bytes. Char and varchar can be used for common text and will hold n characters. Nchar and nvarchar can be used for the full range of Unicode characters and may hold n/2 or fewer characters, depending on the characters chosen. Varchar and nvarchar can also have “max” entered in their parentheses instead of a number, and in this case can hold approximately 2GB of data.
Common data types are listed in the following table:
| Data type | Range | Storage
| bigint
| -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)
| 8 bytes
| int
| -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)
| 4 bytes
| smallint
| -2^15 (-32,768) to 2^15-1 (32,767)
| 2 bytes
| tinyint
| 0 to 255
| 1 byte
| money
| -922,337,203,685,477.5808 to 922,337,203,685,477.5807
| 8 Bytes
| smallmoney
| -214,748.3648 to 214,748.3647
| 4 Bytes
| char(n)
| 1-8,000 characters
| n Bytes
| varchar(n/max)
| For varchar n, 1-8,000 characters or for varchar max, 2^31-1 bytes (2 GB) of characters
| n + 2 Bytes or 2^31-1 bytes (2 GB)
| nchar(n)
| 1-4,000 characters
| 2 * n Bytes
| nvarchar(n/max)
| For nvarchar n, 1-4,000 characters or for nvarchar max, 2^30-1 bytes (2 GB) of characters
| 2 * n Bytes + 2 bytes or 2^30-1 bytes (2 GB)
| time
| 00:00:00.0000000 through 23:59:59.9999999
| 3-5 bytes depending on precision
| date
| 0001-01-01 through 9999-12-31
| 3 bytes
| smalldatetime
| date, 1900-01-01 through 2079-06-06; time 00:00:00 through 23:59:59
| 4 bytes
| datetime
| date, January 1, 1753, through December 31, 9999; time, 00:00:00 through 23:59:59.997
| 8 bytes
| datetime2
| date, 0001-01-01 through 9999-12-31; time, 00:00:00 through 23:59:59.9999999
| 6-8 bytes depending on precision
| datetimeoffset
| date, 0001-01-01 through 9999-12-31; time, 00:00:00 through 23:59:59.9999999; offset, -14:00 through +14:00
| 8-10 bytes depending on precision
| |
For a full list of data types, see the here.
9. Dates
Year
To return the year of a given date, use "year(date)".
[text]
Month
To return the month of a given date, use "month(date)".
[text]
Day
To return the day of a given date, use "day(date)".
[text]
Datepart
To return a certain part of a date, use “datepart(specifieddatepart, date)”, where specifieddatepart is the part of a date to return, and date is the date to return a part from. For example, specifying the datepart of “yy” from the date of “2019-12-01” will return the year of 2019.
The list of dateparts is the following:
| datepart | Abbreviations |
| year | yy, yyyy |
| quarter | qq, q |
| month | mm, m |
| dayofyear | dy, y |
| day | dd, d |
| week | wk, ww |
| weekday | dw, w |
| hour | hh |
| minute | mi, n |
| second | ss, s |
| millisecond | ms |
| microsecond | mcs |
| nanosecond | ns |
As an example:
[text]
Datename
Datename works like datepart, but instead of returning an integer
representing the specified datepart, it returns a character string
representing the specified datepart.
[text]
Result:
Datefromparts
To combine three values into a single date value, use
“datefromparts(year, month, day)”, where year specifies the year, month
specifies the month, and day specifies the day of the new combined date.
[text]
Timefromparts
To combine four values into a single time value, use
“timefromparts(hour, minute, second, fraction, precision)”, where hour
specifies the hour, minute specifies the minute, second specifies the
second, fraction specifies the fraction of a second, and precision is
the number of decimal places of the new combined time.
[text]
Datetimefromparts
To combine seven values into a single datetime value, use
“datetimefromparts(year, month, day, hour, minute, second, milliseconds)”
(recall that datetime is rounded to increments of .000, .003, or .007 seconds):
[text]
Getdate
To return the current date and time, use “getdate()”:
[text]
Current timestamp
Current_timestamp works the same as getdate(). Current_timestamp is the
American National Standards Institute (ANSI) SQL version of the function.
(There are other versions of the SQL language besides the Microsoft version
as in SQL Server. ANSI SQL is a popular standard version of the language that
some versions of SQL may use as a guide to be more compatible with other
versions of SQL.)
[text]
Dateadd
To add a particular number of a particular datepart to a particular date,
use “dateadd(datepart, number, date)":
[text]
Result:
Datediff
To find the difference between two dates, use
“datediff(datepart, startdate, enddate)” where datepart is the datepart
in which to measure the difference, startdate is the earlier of the two
dates, and enddate is the later of the two dates.
[text]
Result:
EOMonth
To return the last day of the month based on a given date, use
"eomonth(date)". Note that this function only works in SQL Server 2012 and
later versions.
[text]
Result:
10. String Functions
Concat
To combine data form multiple columns and/or multiple hard-coded values into a single column
in a query result, use “concat(columnname1/value1, columnname2/value2, columnname3/value3)”.
[text]
Using an alias to name the new column, we have the following:
[text]
Another way to concatenate fields is to use a plus sign:
[text]
Left
Left(string, length) will return the leftmost "length" number of characters from "string".
[text]
Right
Right(string, length) will return the rightmost “length” number of characters from “string”.
[text]
Len
Len(string) returns the number of characters in “string”, excluding spaces at end of the string.
[text]
Len() can be useful if you are researching new data and want to learn more about the types of values that can appear in the field. If you are creating a new table, based on a prior table, want to ensure all existing values will fit in the new table, and can use len() to find the largest value in a given column.
Substring
Substring(string, start, length) returns length number of characters from string beginning with the position of the start-th character.
As an example, suppose you have customer phone numbers and you want to remove the country code:
[text]
Selecting all from the Customers table to see what the data looks like:
The following query:
[text]
Returns results with the country code removed:
Ltrim, Rtrim, Trim
Ltrim(string) removes spaces from the start of a string. Rtrim(string) removes spaces
from the end of a string.
[text]
Selecting all:
Using Ltrim() and Rtrim():
[text]
SQL Server 2017 and later has the function Trim(string) which functions the same as Ltrim(Rtrim(string)).
The trim functions are useful if, for example, you are working with values created by a user’s input, and they have accidentally entered spaces at the beginning or end of what they intended to enter.
Replace
Replace(columnname, string1, string2) replaces any occurrences of string1 in columnname1 with string2.
As an example, suppose we had a table with customer phone numbers, and we wanted to
remove the hyphens:
[text]
The following query:
[text]
Gives:
Additional string functions
Additional string functions can be found in the
Microsoft documentation page on string functions
11. Count, Subtotal, Group, Having
Count
Using tables from first example
[text]
As before, an alias can be used on the count column:
[text]
Max
[text]
Min
[text]
Sum
[text]
Having
Suppose we only wanted to return certain results for records where one of the aggregations (i.e. Count, Max, Min, Sum) meet a certain criteria. Instead of putting the criteria in the “Where” clause, need to use the “Having” keyword.
To return Customers who have orders with a total amount greater than $2,000:
[text]
12. Window Functions
SQL Server has built in functions to add numbers to rows in results based on their order in the results. This is used to rank results, for example, to find the largest orders or to find the customers who have made the largest orders during a particular time period. The two most common functions used are row_number and rank. Row_number adds numbers to rows based on their order in the result set while rank adds numbers to rows but puts the same number for ties (e.g. 1,2,3,4,4,6 instead of 1,2,3,4,5,6 if rows 4 and 5 have the same value for the given column being sorted on).
Row Number
Row_number() requires an “over” clause to be used. An over clause indicates to the column or columns to use to sort the results. The default is to order the results in ascending order. Often it may be desired to find the greatest value and rank it with the lowest number, for example, the largest order be ranked 1, so it is necessarily to specify “desc” so that the results are ordered in descending order.
As an example, the following query ranks the Orders in the order table by Amount,
with the highest Amount being ranked 1.
[text]
Result:
Rank
Change the data in the Order table slightly to demonstrate the Rank
function:
[text]
Running the previous query with Row_Number, we see that although
OrderIDs 6 and 10 have the same amount, they are assigned different numbers:
[text]
Result:
To show OrderIDs 6 and 10 as a tie, can use the Rank() function:
[text]
Result:
13. Subqueries, CTEs, Temp Tables, Table Variables
Some queries require multiple select statements. There are four methods to execute such queries- subqueries, CTEs, Temp Tables, and Table Variables.
Suppose it was desired to find the top 5 orders using the row_number function.
The following query doesn't work:
[text]
The following query also does not work:
[text]
Subquery
A subquery is a full query in itself, that is placed inside of another query. The subquery is enclosed inside of parentheses and then referenced inside of a from clause or where clause. If putting the subquery in a from clause, an alias is required.
A query that satisfies the above example of finding the top 5 orders
using the row_number function:
[text]
Result:
CTE
In a CTE, the queries used to build the final query are placed before it,
again inside of parentheses, but also preceded with the “with” keyword, as
well as a name for each query and the “as” keyword. Then, the name of the
queries in the “with” clause are used to reference them in the final query:
[text]
Result:
Temp Table
A temporary table (temp table) is a table that is stored in SQL for a limited amount of time. It lasts until a user deletes it or ends their connection. To designate a table as a temporary table, prefix its name with a number sign (#).
To satisfy the above example of finding the top 5 orders using the
row_number function, using a temporary table can write one query to
select data into a temporary table, and then another query to return
records from the temporary table:
[text]
Result:
Table Variable
The fourth method is to use a table variable. For information
on table variables, see
here.
14. Delete, Truncate, Update, Alter, Select Into
Delete
To delete data from a table without deleting/dropping the table itself,
can use “delete from tablename”:
[text]
Result:
If we try selecting all data from the Customers table:
[text]
We can see that the table still exists, but there is no data in the table:
A where clause can also be used with Delete.
Truncate
Truncate is like delete, but cannot include a “where” clause. Truncate is
also faster and uses less system resources than delete.
[text]
Result:
Update
To update data in an existing record in a table, can use the “update tablename set fieldname = ‘value’”:
Starting with the Customers table:
The following query will change the name of CustomerID 1 from "John" to "Tim":
[text]
Query returns the following message:
Selecting all data from the Customers table gives:
Alter
To change a table by adding a column, removing a column, or changing a columns datatype, can use the “Alter” keyword.
To add a column, use “Alter table tablename add columnname columntype”.
As an example, suppose we wanted to add a column to the Customers table for Phone Number.
Starting with the Customers table from the original example:
The following query will add a new column with data type of char(15):
[text]
Query returns the following message:
Selecting all data from the table gives:
There is no data in the table yet for Phone Number. It can be added by
deleting the existing data and then inserting the following:
[text]
Selecting all data from the table then gives:
To remove a column, use “Alter table tablename drop column columnname”.
To change a column’s datatype, use “Alter table tablename alter column columnname newcolumntype”.
Select Into
Suppose you wanted to create a new table with data from another table. This can be done with “select * into newtablename from tablename”.
As an example, suppose you wanted to make a backup of the Customers table.
[text]
Selecting all data from the new table:
[text]
Gives:
15. Null values
Null is used to indicate that a value is unknown. When inserting data into a row, Null
can specifically be inserted, or will be automatically filled in for any columns where no
value is inserted.
[text]
Selecting all data from the table:
As an example of null values being added to a row as a default, consider a table with the
PreferredName column where it is left out of the insert statement:
[text]
Selecting all values gives:
It is possible to prohibit a column from having any Null values. This can be done by
adding “not null” after the datatype for a desired column when creating a table:
[text]
Trying to insert data with NULLs in the PreferredName column:
[text]
Gives:
Returning to the first example in this section, suppose we wanted to return records
for Customers that did not have a preferred name. We cannot check for “PreferredName = null”.
(Because null is unknown, any comparison to null is also considered to be unknown, so no
rows are returned.)
[text]
Instead, use the keywords “is null” to check for nulls:
[text]
Use “is not null” to return rows where a given column is not null:
[text]
Also note, can use:
[text]
Isnull
Suppose we do not want to see the null values in the query results. Can use
“Isnull(columnname1, value1/columname2)”. Isnull() checks “columnname1”, and if it is null,
it will return value1/columnname2, where value1/columnname2 is either a value or a column,
and if it is not null, it will return columnname1. This can be used to return empty values
instead of nulls by using “Isnull(columnname, '')”.
[text]
Suppose though we only want one field returned for the first name, and we want the customer’s
preferred name if they have one, and otherwise we want their first name. This is accomplished
using the coalesce() function, as in “isnull(columnname1, columnname2)”, which will return
columnname1 if it contains data (i.e. is not null), and otherwise will return columnname2.
This example will return the FirstName for John and Jane, but will replace Robert with the
PreferredName of Bob:
[text]
Coalesce
The coalesce() function is similar to the isnull() function, except that coalesce() can be used with more than two columns, as in “coalesce(columnname1, columnname2, columnname3)”, which will return the first columnname that has data.
For the sake of an example, suppose there were some customers you weren’t on a first name basis
with, and you had a title column for these customers.
[text]
Selecting all from Customers just to see what the data looks like:
Then:
[text]
16. Primary Key and Foreign Key
There is nothing in our examples so far that prevents us from having two customers with the same CustomerID.
Consider the first example, but suppose the CustomerID for Bob Jones was entered as a 3
instead of a 2.
[text]
If we try to find the customer name associated with OrderID 13, it gives two results:
[text]
To prevent this, we can make the CustomerID column on the Customers table what is called
a primary key. Designating a column or set of columns as a primary key forces the value in
that column or set of columns to be unique for every row of the table. A column is designated
as a primary key by creating a primary key constraint on the column.
[text]
If we then try to insert two records with the same CustomerID, we receive an error:
[text]
Suppose we have already designated a column as a primary key, but want to make another column be unique as well. This is done by adding the keyword “unique” after the data type for the desired column when creating a table.
As an example, suppose we wanted to add a column for the customer phone number, and wanted it
to be unique for each person.
[text]
If we then try to
[text]
Adjusting the phone number for Bob Jones, we have a valid insert:
[text]
A difference to note between a primary key constraint and unique constraint is that a unique constraint will allow one row with a null value, while a primary key will not allow a Null value. (Nulls can be prevented in a column with a unique constraint by designating it “not null”.)
To add a unique constraint after the table is already created
[text]
To remove a constraint
[text]
The constraint name can be found looking in the Object Explorer:
It can also be found using the following query, substituting the desired tablename in
for “Customers”:
[text]
Suppose we want to ensure that no orders can be added to the Orders table with a CustomerID that
doesn’t exist in the Customers table. In this case, we need to create a Foreign Key on a column in the
Orders table to the CustomerID column Customers table. Creating a foreign key on a column in one table
to another column in another table means before any value can be added to the first column, it must
already exist in the second column. (Typically, the column in the second table will either be a primary
key or have a unique constraint on it, otherwise if a NULL value is entered in the second column, a NULL
value could accidentally be entered into the first table.) This would at It is said that the foreign
key from the first table references the column in the second table. This is done when creating a table
by adding “foreign key (columnname1 in tablename1) references tablename2 (columnname2)”, where
columname1 refers to the column the foreign key is being placed on, tablename1 is the name of the
table being created, tablename2 is the table being referenced, and columname2 is the column being
referenced in tablename2.
[text]
If we try to insert rows to the table for which there is no matching CustomerID in the Customers table,
SQL server will return an error.
[text]
17. Indexes
The idea of an index is to store a copy of a column or columns from a table, with one or more of these columns sorted, along with the location of where a given row in the index can be found in the original table. Like an index of a book helping a reader locate a certain page, indexes enable SQL Server to locate a given row more quickly.
By sorting a table itself in a certain way, SQL Server can avoid having to save a copy of those columns to create an index. When the table itself is sorted and declared to have an index stored in the table itself, that index is called a clustered index, and the table is called a clustered table. The table itself can only be sorted in one way, and so a table can only have one clustered index.
Additional indexes, which will be new copies of a column or columns, are called nonclustered indexes.
The columns to be sorted are called key columns, and other columns added to the index are called included columns. The reason to add more columns to the index that those that are sorted, is that if all columns used in a query are included in an index, SQL Server will not have to take the location of a row given in the index and look up the row in the original table, it can just take the value of each of the columns from the index. When all columns from a query are contained in an index, it is said that the index covers the query, and the index can be referred to as a covering index of the query.
When a primary key is placed on a table, the default in SQL Server is to create a clustered index on that column.
As an example, create the Customers table with a primary key on the CustomerID:
[text]
Viewing the Object Explorer will show that an index has been created:
To create a nonclustered index:
[text]
To remove a nonclustered index:
[text]
The command “create index” will default to creating a nonclustered index, so “create index” is
equivalent to “create nonclustered index”:
[text]
To create a unique index:
[text]
To define a clustered index on a column that is not the primary key, first create the primary key
as a nonclustered index, and then create a clustered index on the non-primary key column
[text]
18. Execution Plans
To view in more detail the steps SQL Server takes when executing a query, it is possible to view
the query’s execution plan. To have SQL Server display the query execution plan along with the query
results, click the “Include Actual Execution Plan” button on the top menu. (This option can also be
turned on and off with the keyboard shortcut Ctrl + M):
Looking at:
[text]
We see a new tab returned alongside the query results:
An execution plan is composed of a set of operators, also called iterators, which are linked together in a tree format. Data in the tree flows from right to left and from bottom to top.
In this example, there are only two operators. The following example is more complex:
[text]
In addition to the graphical show plan, it is also possible to view a text version of the plan.
[text]
Re-running the previous example
[text]
Gives:
Note that if Showplan_text is turned on, the graphical plan will not be returned.
Showplan_text can be turned off using the command “set showplan_text off”:
[text]
It is also possible to see the query plan in XML (Extensible Markup Language). For further information, see here .
In addition to viewing the actual execution plan, it is possible to view an
estimated execution plan. This is useful to see the plan for a query that may take
a long time to execute without having to run the query.
To have SQL Server display the estimated execution plan, click the “Display Estimated Execution Plan” button on the top menu. (The estimated execution plan can also be explained with the keyboard shortcut Ctrl + L):
Viewing the execution plan of the previous query:
[text]
Scan and Seek
Two of the most common types of operators are scans and seeks. A scan returns an entire table or index. A seek returns “rows from one or more ranges of an index”.
Table Scan
Starting with our basic Customer table example, suppose we are selecting all data in the
Customers table.
[text]
Of course, since the entire table is being selected, SQL Server has to scan the entire Customers table to return the result.
Suppose we only wanted to select one row from the Customer table, where the CustomerID
was 2.
[text]
SQL Server still needs to do a Table Scan. As there is no primary key or unique
constraint on the CustomerID column, SQL Server doesn’t know it can stop scanning once
it finds the single row where the CustomerID is 2:
Clustered Index Scan
Suppose now we add a primary key to the Customers table:
[text]
If we select all rows from the table:
[text]
A Clustered Index Scan is now done. Although we now have a primary key on the
CustomerID column, a scan is still done because all rows must be returned. But because
we added the primary key, it made the table a clustered index, and so a Clustered Index Scan
is done instead of a Table Scan:
Clustered Index Seek
Now suppose we only want to return the row for CustomerID 2.
[text]
A Clustered Index Seek is now done. Because there is primary key on the CustomerID
column, all the rows are in order by the CustomerID, and so when CustomerID 2 is found,
the table can stopped being searched:
Index Scan
To see an example of a (non-clustered) index scan, create a non-clustered index:
[text]
Then run a query selecting on that column:
Result:
Index Seek
Following the previous example, run the following query for an example of index seek:
[text]
Result:
As a more useful example (since there wouldn’t be much point in just returning a
result with the same name that was in the where clause, create a new index using
LastName and FirstName), change the index to also use FirstName:
Then run the following query:
Result:
Join operators
There are three types of join operators- merge join, hash join, and nested loops join.
Merge Join
The join operator that requires that most specific conditions, and is often the fastest join operator, is the merge join. Merge join requires an equijoin predicate and for both tables to be sorted on the join keys (except in the case of a full outer join). (An equijoin predicate is a join predicate using only an equal sign, and not using the inequality operators > or <.) The cost is proportional to sum of number of rows in tables.
Merge join works as follows. Start with the first row of the first and
second tables. If rows satisfy the join predicate, include their join in the
operator output, and move on to the next row in the second table. Continue
checking the join predicate, including the joined rows in the output and
moving on to the next row in the second table until the two rows no longer
satisfy the join predicate. When they no longer satisfy the join predicate,
move on to the next row in the first table. If the next row in the first
table does not satisfy the join predicate with the current row in the second
table, check if the join value is greater in the first or second table.
Whichever table currently has the greater join value, move on to the next row
in the other table. Continue this process until the ends of both tables are
reached. Pseudo-code for merge joins is given by the following:
Source:
https://docs.microsoft.com/en-us/archive/blogs/craigfr/merge-join
I haven't found a simple useful example yet, but to see what merge join
looks like in the output, can tell SQL Server to use a merge join by using
a merge join hint, which can override what SQLServer would choose.
Links to the documentation for query hints are in the below "Hints"
section. Using the Customers and Orders table from the most basic join
example:
[text]
Hash Join
Hash join requires an equijoin predicate, but does not require sorted inputs. It is useful for the largest joins. A hash aggregate requires all rows in one of its input tables to be stored in memory. If there is not enough available memory, then extra rows must be stored in tempdb.
A hash join uses a hash function and hash table. A hash function is a function that transforms its inputs into outputs called hash indexes or hash values or hash codes. A hash table is divided into buckets, where each hash index or hash code corresponds to a bucket of the hash table, and thus gives the location of the bucket in the hash table.
For a hash join, first hash values are calculated for all rows in one of the input tables. This table is called the build table. From the hash value, the row is inserted into the corresponding hash bucket of the hash table. Next, for each row in the other input table, called the probe table, the hash value is calculated. This will provide a corresponding hash bucket. It is possible for multiple input values to be mapped by the hash function to the same hash bucket, so the row from the probe table is compared to each row in the hash bucket from the build table. If the row from the probe table matches the row from the build table, the rows will be joined and output.
(A hash join is faster than a nested loop join, which is discussed next,
because the hash value gives the location of the bucket to look in to match
the row from the second table with rows from the first table, and the entire
first table does not need to be scanned to find matches of a given row in
the second table.)
Source:
https://blogs.msdn.microsoft.com/craigfr/2006/08/10/hash-join/
As an example, return to the original Customers and Orders tables.
[text]
Nested Loops Join
A nested loops join can be used for any join, as it doesn’t require equijoin predicates
or ordered input sets. At the cost of this flexibility, it is often the slowest type of join.
Excluding indexes and any other optimizations, it requires comparing all rows in one of the tables
to all rows in the other table, and so the cost, excluding indexes or any other optimizations, is
proportional to the product the number of rows in the two tables. Pseudo-code is:
Source:
https://docs.microsoft.com/en-us/archive/blogs/craigfr/nested-loops-join
Example:
[text]
Aggregates
A stream aggregate is an operator used for implementing aggregate functions when the aggregate functions are used without a “group by” clause or when a “group by” clause is included and the “group by” columns are sorted/have an index on them. When the “group by” columns are not sorted, a hash aggregate is used.
Stream aggregate
Because the records are sorted when using a stream aggregate, SQL Server
can read the records once each, in order, and aggregate the desired columns
for each grouping of the “group by” columns. Psuedo-code is the following:
Source:
https://docs.microsoft.com/en-us/archive/blogs/craigfr/stream-aggregate
As an example of an aggregate without a “group by” clause:
[text]
The Compute Scalar operator in this example isn’t involved in counting the rows. The output of the
Stream Aggregate that flows to the Compute Scalar has the data type of bigint, and the Compute Scalar
operator just converts it from bigint to int. This can better be seen by looking at the text plan:
Next, as an example of an aggregate with a “group by” clause:
[text]
If there is a “group by” clause, the stream aggregate requires sorted data. As we have no index on OrderDate, SQL Server must use the Sort Operator to sort the data before passing it to the Stream Aggregate Operator.
If we add an index to the Orders table using the columns OrderDate and OrderID, we can rerun the
query and see that the Sort operator is no longer needed, because the data is now sorted due to the index:
[text]
Hash aggregate
Pseudo-code for hash aggregates is the following:
Source:
https://docs.microsoft.com/en-us/archive/blogs/craigfr/hash-aggregate
Add more rows to Order team (turn off “Show Execution Plan” while running the following):
[text]
Then execute:
[text]
A hash aggregate requires all rows in the build table to be stored in memory. Like with hash join, if there is not enough available memory, then extra rows must be stored in tempdb.
Other operators
Full list of operators available here .
Parallelization
Certain operators, such as hash joins, nested loop joins, and hash aggregates, are able to be run in parallel. For the graphical showplan, operators running in parallel will have a little symbol with two arrows on the operator icon. For more on parallelism, see here.
Hints
Typically, SQL Server will execute a given query in the most efficient manner. It is possible to direct SQL Server to execute a query in a different way than what it chooses on its own, using hints.
To direct SQL Server to use a certain type of join (i.e. Loop, Hash, or Merge), can use join hints. For more information, see here.
To direct SQL Server to use a certain index (along with specifying other behaviors), can use table hints. For more information, see here.
To direct SQL Server to join tables in the exact order as written, can use the query hint Force Order. For more information, see here.
It is also possible to provide an entire query plan in xml format that will be used instead of a system generated plan. See documenation here.